A key aspect of accessibility to parks is the distance between parks and neighborhoods. There are two different ways to calculate distance:

Both measurements have limitiations. Euclidean distances are particularly problematic when there are rivers, interstates, or other impassible features that substantially lengthen the distance people travel to get between points. On the other hand, there is not lots of data suggesting that people perceive network travel times very well when they are choosing destinations. We originally estimated the models for this paper with Euclidean distances, but our reviewers suggested we examine network-based distances as well.

Another issue the reviewers identified is the spatial resolution of the analysis. We had used tracts, because this is where the socioeconomic and health data are available. But block groups are more geographically precise, and it is possible for us to compute the accessibility at that level, if we assume that the socioeconomics and health information is constant across the tract (this is not a good assumption, but we can entertain it for the purposes of the response to reviewers).

We will use the same parks dataset we assembled previously.

open_spaces <- read_rds("data/open_spaces.rds")

Euclidean Distances

The geometric centroid of the tract / block group is not a good approximation of the center point of the tract from an accessibility standpoint. Thus, we get the population-weighted centroid for tracts and block groups from Census.

tracts_url <- "https://www2.census.gov/geo/docs/reference/cenpop2010/tract/CenPop2010_Mean_TR36.txt"

tract_centroids <- read_csv(tracts_url) %>%
  filter(COUNTYFP %in% counties) %>%
  transmute(geoid = str_c(STATEFP, COUNTYFP, TRACTCE), LATITUDE, LONGITUDE) %>%
  # convert to sf and project into the feet-based projection used in the
  # parks dataset.
  st_as_sf(crs = 4326, coords = c("LONGITUDE", "LATITUDE")) %>%
  st_transform(3628) 
## Parsed with column specification:
## cols(
##   STATEFP = col_double(),
##   COUNTYFP = col_character(),
##   TRACTCE = col_character(),
##   POPULATION = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double()
## )
bg_url <- "https://www2.census.gov/geo/docs/reference/cenpop2010/blkgrp/CenPop2010_Mean_BG36.txt"

bg_centroids <- read_csv(bg_url) %>%
  filter(COUNTYFP %in% counties) %>%
  transmute(geoid = str_c(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE), LATITUDE, LONGITUDE) %>%
  # convert to sf and project into the feet-based projection used in the
  # parks dataset.
  st_as_sf(crs = 4326, coords = c("LONGITUDE", "LATITUDE")) %>%
  st_transform(3628) 
## Parsed with column specification:
## cols(
##   STATEFP = col_double(),
##   COUNTYFP = col_character(),
##   TRACTCE = col_character(),
##   BLKGRPCE = col_double(),
##   POPULATION = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double()
## )

The st_distance() function from the sf library is able to compute the Euclidean distance between two sets of shapes, and gets the distance between points and edges of polygons directly. We calculate the distance in miles, and constrain that the minimum distance is 0.1 miles (no one lives in a park).

distance_table <- function(points, poly){
  dists <- st_distance(points, poly, by_element = FALSE) %>%
    units::set_units(miles) %>%
    units::drop_units()
  
  dists <- pmax(dists, 0.1)
  
  rownames(dists) <- points$geoid
  colnames(dists) <- poly$id
  
  as_tibble(dists, rownames = "geoid") %>%
    pivot_longer(cols = -geoid, names_to = "park_id", values_to = "distance")
}

dist_tr_euc <- distance_table(tract_centroids, open_spaces)
dist_bg_euc <- distance_table(bg_centroids,    open_spaces)
list("tracts" = dist_tr_euc, "blockgroups" = dist_bg_euc) %>%
  write_rds("data/distances.rds")

Network Distances

To calculate network-based distances between block group / tract centroids and park boundaries, we employ the osmnx library for Python. This is a relatively new package that allows users to retrieve networks from OpenStreetMap and compute shortest paths using Python’s other network libraries. We retrieve a walk network for New York City, and re-project it to UTM zone 18N.

graph = ox.graph_from_place("New York City", network_type='walk')
# project to UTM zone 18 N and simplify
graph_proj = ox.project_graph(graph)
graph_proj = nx.DiGraph(graph_proj)

One limitation of the network calculation is that it only works for nodes on the network, and we can only find nearest nodes to points. Thus we need to convert the park polygons data into points. But we can’t simply have an endless number of points because the shortest path calculations are computationally extensive.

So to make this work, we first simplify the parks polygons, convert the boundaries to linestrings, and then sample points along the linestring.

# get park boundary polygon
park_boundaries <- open_spaces %>%
  select(id)  %>%
  
  # some polygons are way too detailed, so we want to simplify to 100 foot resolution
  st_simplify(dTolerance = 100, preserveTopology = TRUE) %>%
  
  # need to convert everything to multipolygons to get multiple rows per shape
  st_cast("MULTIPOLYGON") %>%
  
  st_cast("POLYGON") %>%

  # cast the polygons to a linestring of the park perimeter
  st_cast("LINESTRING", group_or_split = TRUE) 
## Warning in st_cast.sf(., "POLYGON"): repeating attributes for all sub-geometries
## for which they may not be constant
## Warning in st_cast.sf(., "LINESTRING", group_or_split = TRUE): repeating
## attributes for all sub-geometries for which they may not be constant
point_samples <- park_boundaries %>%
  # sample points along the line, one point per 500 feet.
  st_line_sample(density = 1/500) 

# append open space id and coerce to single point per row
park_points <- st_sf(id = park_boundaries$id, geometry = point_samples)  %>%
  st_as_sf() %>%
  st_cast(to = "POINT")
## Warning in st_cast.sf(., to = "POINT"): repeating attributes for all sub-
## geometries for which they may not be constant

We can illustrate what this looks like. Every park has at least one sampled point, and some have several. Unfortunately there’s not a concave polygon function to help us remove all the internal points from the parks.

leaflet() %>%
  addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
  addPolygons(data = open_spaces %>% st_transform(4326), color = "green") %>%
  addCircleMarkers(data = park_points %>% st_transform(4326), radius = 0.001,
                   color = "red") 

With 13512 and 2168, that suggests we need to compute 87746928 shortest paths. Even on a 20+ core process, this would take several weeks. As a result, we simplify the problem by first identifying the “closest” point of the park by Euclidean distance and then computing the network path to that point. This introduces some distortion where if the closest Euclidean point is across a river or interstate, the method calculates the network distance to that point instead of the first point the person would encounter following the network. This may end up overstating the true experienced distance.

# loop through centroids for block group or tract
for bg in df.itertuples():

    # loop through parks
    for park in park_ids:
        these_points = park_points.loc[[park]]
        min_euc = float("inf")
        
        # loop through points associated with park
        for point in these_points.itertuples():
            # find closest park point by Euclidean distance
            dx = point.LONGITUDE - bg.LONGITUDE
            dy = point.LATITUDE - bg.LATITUDE
            euc_dist = math.sqrt(dx**2 + dy**2)
            
            if euc_dist < min_euc:
                min_euc = euc_dist
                closest_point = point.node
                
        # compute shortest path between closest_point and centroid
        try:
            length = nx.shortest_path_length(graph, source=bg.node, target=closest_point, weight='length')
        except:
            # can fail if no path exists
            length = float("inf")

Calculated this way, the path is calculated in meters.

write_points_file <- function(points, file){
  points %>% 
    st_transform(4326) %>%
    mutate(
      LATITUDE = st_coordinates(.)[, 2],
      LONGITUDE = st_coordinates(.)[, 1]
    ) %>%
    st_transform(32618) %>%
    mutate(
      Y = st_coordinates(.)[, 2],
      X = st_coordinates(.)[, 1]
    ) %>%
    st_set_geometry(NULL) %>%
    write_csv(file)
}

park_points     %>% write_points_file("data/park_points.csv")
bg_centroids    %>% write_points_file("data/bg_centroids.csv")
tract_centroids %>% write_points_file("data/tract_centroids.csv")